Query: positional vector
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Sabbaghi, Mahdi, Pappas, George, Hassani, Hamed, Goel, Surbhi
Despite the success of Transformers on language understanding, code generation, and logical reasoning, they still fail to generalize over length on basic arithmetic tasks such as addition and multiplication. A major reason behind this failure is the vast difference in structure between numbers and text: numbers, for example, are typically parsed from right to left, and there is a correspondence between digits at the same position across different numbers, whereas for text such symmetries are quite unnatural. In this work, we propose to encode these semantics explicitly into the model via modified number formatting and custom positional encodings. Empirically, our method allows a Transformer trained on numbers with at most 5 digits for addition and multiplication to generalize up to 50-digit numbers, without using additional data for longer sequences. We further demonstrate that traditional absolute positional encodings (APE) fail to generalize to longer sequences, even when trained with augmented data that captures task symmetries. To elucidate the importance of explicitly encoding structure, we prove that explicit incorporation of structure via positional encodings is necessary for out-of-distribution generalization. Finally, we pinpoint other challenges inherent to length generalization beyond capturing symmetries, in particular the complexity of the underlying task, and propose changes to the training distribution to address them.
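The formatting idea from this abstract is easy to illustrate. The sketch below is not the authors' code; the helper names, the padding width, and the position-id rule are all illustrative assumptions. Digits are reversed (least significant first) and zero-padded so that index i always means "the 10^i digit", and a toy position-id function restarts the count at each operand so that digits of equal significance share a position.

```python
def format_addition_example(a: int, b: int, width: int = 6) -> tuple[str, str]:
    """Format an addition problem with reversed (LSB-first) digits.

    Reversing puts the least significant digit first, so index i in every
    operand refers to the same power of ten -- the structural symmetry the
    paper argues must be made explicit to the model.
    """
    def rev_digits(n: int) -> str:
        return str(n)[::-1].ljust(width, "0")  # pad with trailing zeros

    prompt = f"{rev_digits(a)}+{rev_digits(b)}="
    answer = str(a + b)[::-1].ljust(width + 1, "0")
    return prompt, answer


def digit_position_ids(prompt: str) -> list[int]:
    """Assign each digit its index within its own operand.

    Restarting the count at '+' and '=' is a simple stand-in for a custom
    positional encoding that ties together digits of equal significance
    across operands (operator tokens also get id 0 in this toy version).
    """
    ids, i = [], 0
    for ch in prompt:
        if ch in "+=":
            ids.append(0)
            i = 0
        else:
            ids.append(i)
            i += 1
    return ids


# 357 + 48 = 405 becomes ("753000+840000=", "5040000") after reversal/padding.
prompt, answer = format_addition_example(357, 48)
print(prompt, answer, digit_position_ids(prompt))
```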
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
Dong, Zican, Li, Junyi, Men, Xin, Zhao, Wayne Xin, Wang, Bingbing, Tian, Zhen, Chen, Weipeng, Wen, Ji-Rong
Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond that length. Many methods have been proposed to extend the context window and achieve length extrapolation in LLMs, but these approaches still lack in-depth interpretation. In this study, we explore the positional information within and beyond the context window to decipher the underlying mechanisms of LLMs. Using a mean-based decomposition method, we disentangle positional vectors from the hidden states of LLMs and analyze their formation and their effect on attention. Furthermore, when texts exceed the context window, we analyze how positional vectors change in two settings: direct extrapolation and context window extension. Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can effectively extend the context window.
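A rough sketch of how such a mean-based decomposition might look in practice may help. This is our reading of the idea, not the authors' released code; the function name and the HuggingFace-style API usage are assumptions. Averaging the hidden state at each position over many unrelated texts approximately cancels the content-dependent components, leaving a per-position vector.

```python
import torch


@torch.no_grad()
def estimate_positional_vectors(model, tokenizer, texts, layer, seq_len):
    """Estimate per-position vectors by averaging hidden states over inputs.

    Averaging over many different texts cancels (to first order) the
    token-content component of each hidden state, so what remains depends
    mainly on position -- the mean-based idea described in the abstract.
    """
    total, count = None, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=seq_len).input_ids
        if ids.shape[1] < seq_len:   # skip short texts so positions stay aligned
            continue
        hs = model(ids, output_hidden_states=True).hidden_states[layer][0]
        total = hs if total is None else total + hs
        count += 1
    return total / count             # shape (seq_len, hidden_dim): one vector per position


# Usage with any HuggingFace causal LM, e.g.:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
#   tok = AutoTokenizer.from_pretrained("gpt2")
#   pos_vecs = estimate_positional_vectors(model, tok, corpus, layer=6, seq_len=512)
```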
Illustrated Guide to Transformer
The Transformer model is an evolution of the encoder-decoder architecture, proposed in the paper Attention Is All You Need. While earlier encoder-decoder architectures relied on recurrent neural networks (RNNs) to extract sequential information, the Transformer uses no recurrence at all. Transformer-based models have largely replaced LSTMs and have proven superior in quality on many sequence-to-sequence problems. The Transformer relies entirely on attention mechanisms, which makes it parallelizable and therefore much faster to train, and it has produced state-of-the-art performance in machine translation.
In the earlier RNN-based encoder-decoder, for example, the input in machine translation is an English sentence and the output is its French translation. The encoder consumes each word in sequence and forms a fixed-length vector representation of the input English sentence. The decoder then takes this fixed-length vector as input and produces the French words one after another, forming the translated French sentence. However, RNN models have some problems: they are slow to train, and they cannot handle long sequences, because the input must be processed sequentially, one step after the other.
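To make the contrast with that sequential bottleneck concrete, here is a minimal NumPy sketch of the scaled dot-product attention the Transformer is built on (this specific snippet is ours, not from the guide): every query attends to every key in a single matrix product, so all positions are processed in parallel rather than step by step.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    The QK^T product scores every query-key pair at once, which is why
    attention parallelizes over the sequence while an RNN cannot.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)                 # (seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


# Self-attention over a toy sequence of 4 tokens with dimension 8:
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```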